Abstract

We introduce a modified model of random walk and develop two novel
clustering algorithms based on it. In the algorithms, each data point in a
dataset is regarded as a particle that moves at random in space according to
the preset rules of the modified model. Each data point may also be viewed
as a local control subsystem, in which the controller adjusts its transition
probability vector according to the feedback from all data points, and its
transition direction is then identified by an event-generating function.
Finally, the positions of all data points are updated. As they move in space,
data points gather gradually and separated groups emerge among them
automatically. As a consequence, data points that belong to the same class
come to lie at the same position, whereas those that belong to different
classes stay away from one another. The experimental results demonstrate
that data points in the test datasets are clustered reasonably and
efficiently, and the comparison with other algorithms also indicates the
effectiveness of the proposed algorithms.

Keywords: Multi-particle systems; Self-organization; Data clustering; 
Random walks 



1 Introduction 

Data clustering is a widely investigated problem in pattern recognition. Over
the past forty years, many excellent clustering algorithms have been
presented, from those that emphasize cluster centers and boundaries, such as
K-means [1] and support vector clustering (SVC) [2], to the more recent
particle swarm optimization (PSO) based [3], ant-based [4], and flocking-based
[5] clustering algorithms. Looking at the history of clustering algorithms, we
can notice a significant change, which may be viewed as two stages. In the
first stage, data points were fixed, and various functions were used to find
complex curved surfaces that cluster or classify them. In the second stage,
over the past few years, some pioneers asked why data points could not move by
themselves, like agents, and gather together automatically. Following this
idea, they created a few exciting algorithms [3], [4], [5], in which data
points move in the whole space according to simple local rules preset in
advance.

In addition, a random walk is a special class of stochastic process [6], which
can be described simply as follows. Assume that a particle walks on a straight
line and, at each time step, takes either one step to the right with
probability p or one step to the left with probability q = 1 − p. The sequence
of positions produced by the moving particle is defined as a random walk on a
line. Likewise, given a graph and a particle located at one of its vertices,
the particle visits one of the neighboring vertices at random, and from that
vertex it again selects the next vertex at random. The sequence of vertices
visited by the particle is defined as a random walk on a graph [7]. Random
walks in higher dimensional spaces and their variations, which exhibit more
complex behavior, have also been studied.
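
The random walk on a line described above can be sketched in a few lines of Python. This is only a minimal illustration of the classical walk, not of the modified model; the function name `random_walk_on_line` and its parameters are chosen here for the example.

```python
import random


def random_walk_on_line(steps, p, start=0, seed=None):
    """Return the position sequence of a particle that steps +1 with
    probability p and -1 with probability q = 1 - p."""
    rng = random.Random(seed)
    positions = [start]
    for _ in range(steps):
        # rng.random() is uniform on [0, 1), so this branch has probability p
        step = 1 if rng.random() < p else -1
        positions.append(positions[-1] + step)
    return positions
```

Calling `random_walk_on_line(100, 0.5, seed=1)` returns 101 positions in which consecutive entries always differ by exactly one unit.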

In this paper, we propose a modified model of random walk and develop two
clustering algorithms based on it. In our algorithms, data points in a dataset
are regarded as particles that walk in space at random. Each data point can
also be viewed as a local control subsystem, whose controller controls its
walking behavior. After a data point takes a step toward one of its neighbors,
its position is updated. As data points move in space at random according to
the rules of the modified model, they gather gradually and finally form
clusters automatically. The remainder of this paper is organized as follows.
Section 2 briefly reviews related work on random walks and explains our
motivation. Section 3 introduces the modified model of random walk. Section 4
discusses the convergence of the modified model. Section 5 first elaborates
two clustering algorithms based on the model and then analyzes them in detail.
Next, the effects of some important parameters are discussed. Section 6
presents the experimental results of the algorithms. Finally, the conclusion
is given in Section 7.

2 Related work 

Our work on dynamical clustering is inspired by Cui et al. [5], who presented
a flocking-based algorithm for document clustering. In their algorithm, each
document vector is first mapped to a boid in a two-dimensional virtual space.
By means of four rules, boids with similar features move, gather together
automatically and establish a flock, while flocks with different features keep
away from one another.

In the last ten years, methods related to random walks have been widely
applied in many fields, such as computer science [9], [10], [11], physics
[12], and biology [13]. In particular, in algorithmic theory, some
approximation algorithms based on random walks have been presented to solve
NP-hard problems.

On the other hand, although the theory of random walks on graphs [7] has been
investigated by mathematicians for some years, the idea was not applied to
pattern recognition until recent years. For instance, Luh Yen et al. [8] gave
a random walk based distance measure that is used in the k-means algorithm as
a new distance measure. David Harel et al. studied an algorithm based on
deterministic exploration of random walks on a weighted graph. The
similarities between data points were computed from the k-th power of the
transition probability matrix. Edges with similarities equal to or approaching
zero were then removed by separating operators, so that each disconnected
subgraph represented a cluster of spatial data. Later, Güneş Erkan [10]
extended a model similar to that of David Harel et al. to the directed graph
case, and introduced a language model-based document clustering algorithm as
an application of this model.

In their algorithms, however, the vertices of the graph (data points) are
fixed at their positions, so the shape of the graph, which represents the
connections among data points, is almost unchanged. Unlike in their methods,
data points in our algorithms are regarded not only as vertices of a graph but
also as particles that can move in the whole space according to the rules of
the modified model, so that the shape of the graph that they construct changes
over time. As data points walk constantly at random in space, the clusters are
established automatically in the process.



3 Modified model of random walk 

Assume a set X of N particles, X = {X_1, X_2, ..., X_N}, in which X_i is
the position of a particle located in an m-dimensional metric space. From the
point of view of control theory, the system composed of N particles that walk
randomly in space may be described by the block diagram in Fig. 1.



[Figure 1 block labels: transition probability matrix P; position matrix X]

Figure 1: Block diagram of the system.



As shown in Fig. 1, the controlled object is the transition probability
matrix P, and the outputs of the system are the new positions of all particles
in the system. The controller C adjusts the entries of the transition
probability matrix P according to the current positions of the N particles,
and then decides the transition directions and transition distances of the N
particles at the next moment. Finally, the positions of the N particles are
updated synchronously. The equation of motion of the whole system and the
output equation are written as follows:

\[
\begin{aligned}
P(t+1)_{n \times n} &= [P_i(t+1)_{1 \times n}]_{i=1,2,\cdots,N}
  = B_{n \times n} \oslash (B_{n \times n} \times (1)_{n \times n}) \\
X(t+1)_{n \times m} &= [X_i(t+1)_{1 \times m}]_{i=1,2,\cdots,N} \\
  &= X(t)_{n \times m}
   + \Big( \big( (Eve(t+1)_{n \times n} \otimes P(t+1)_{n \times n}) \times (1)_{n \times 1} \big) \times (1)_{1 \times m} \Big)
     \otimes \big( Eve(t+1)_{n \times n} \times X(t)_{n \times m} - X(t)_{n \times m} \big)
\end{aligned}
\tag{1}
\]

Here B_{n×n} = [a_ij(t+1)] is the matrix of unnormalized transition weights,
which is built from the adjacent matrix L(t+1)_{n×n}, the degree vectors
K(t+1)_{n×1} and K(0)_{n×1}, and the distance matrices D(t+1)_{n×n} and
D(0)_{n×n}; its entries are the weights a_ij(t+1) used in Eq. 7, and dividing
each entry by its row sum gives P(t+1)_{n×n}.

where the matrices D(t+1)_{n×n} and L(t+1)_{n×n} are the distance matrix and
the adjacent matrix of the N particles respectively; Eve(t+1)_{n×n} is an
event matrix, which indicates the transition directions of the particles in
the system; and K(t+1)_{n×1} denotes the degree vector, each entry of which
gives the number of particles within the neighborhood of a particle X_i. We
have also defined two matrix operations, expressed by Eq. 2: (a) the
multiplication of corresponding entries of two matrices, represented by the
symbol ⊗, and (b) the division of corresponding entries of two matrices,
represented by the symbol ⊘.

\[
A_{n \times n} \otimes B_{n \times n} =
\begin{pmatrix}
a_{11} \times b_{11} & \cdots & a_{1n} \times b_{1n} \\
\vdots & \ddots & \vdots \\
a_{n1} \times b_{n1} & \cdots & a_{nn} \times b_{nn}
\end{pmatrix},
\qquad
A_{n \times n} \oslash B_{n \times n} =
\begin{pmatrix}
a_{11} / b_{11} & \cdots & a_{1n} / b_{1n} \\
\vdots & \ddots & \vdots \\
a_{n1} / b_{n1} & \cdots & a_{nn} / b_{nn}
\end{pmatrix}
\tag{2}
\]



Further, if each particle X_i is viewed as a local control subsystem, the
whole system described above can be redrawn as a distributed system composed
of N local control subsystems, shown in Fig. 2, where the output of each
subsystem is fed back to all other subsystems as well as to itself.




Figure 2: Block diagram of N local control subsystems. 

For the local control subsystem of a particle X_i, the controller C_i obtains
its own and the other particles' current positions by receiving the feedback
of all subsystems, and then computes the distances between X_i and all other
particles to form a distance vector D_i(t+1)_{1×n} by means of the selected
distance function d : X × X → ℝ, which satisfies the condition that the closer
two particles are, the smaller the output of the function is.

\[
D_i(t+1)_{1 \times n} = \big[ d(X_i(t), X_j(t)) \big]_{1 \times n},
\quad j = 1, 2, \cdots, N
\tag{3}
\]



Based on the distance vector D_i(t+1)_{1×n} and an interaction range R set for
each particle, the adjacent vector L_i(t+1)_{1×n} of a particle X_i is
established by Eq. 4, which indicates which particles are within its
R-neighborhood.

\[
L_i(t+1)_{1 \times n} = \{ l_{ij}(t+1),\ j = 1, 2, \cdots, N \}, \qquad
l_{ij}(t+1) =
\begin{cases}
1 & \text{if } d(X_i(t), X_j(t)) \le R \\
0 & \text{otherwise}
\end{cases}
\tag{4}
\]



Furthermore, the neighbor set Γ_i(t+1) and the degree K_i(t+1) of the particle
are produced from the adjacent vector L_i(t+1)_{1×n}:

\[
\begin{aligned}
\Gamma_i(t+1) &= \{ j \mid l_{ij}(t+1) = 1 \} \\
K_i(t+1) &= \sum_{j \in \Gamma_i(t+1)} l_{ij}(t+1) = |\Gamma_i(t+1)|
\end{aligned}
\tag{5}
\]



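Here, the symbol |·| represents the cardinality of a set. Thus, the distance
matrix D(t+1)_{n×n}, the adjacent matrix L(t+1)_{n×n} and the degree vector
K(t+1)_{n×1} of the whole system take the following forms:

\[
\begin{aligned}
D(t+1)_{n \times n} &= [D_i(t+1)_{1 \times n}]_{i=1,2,\cdots,N} \\
L(t+1)_{n \times n} &= [L_i(t+1)_{1 \times n}]_{i=1,2,\cdots,N} \\
K(t+1)_{n \times 1} &= [K_i(t+1)]_{i=1,2,\cdots,N}
\end{aligned}
\tag{6}
\]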
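
The quantities of Eqs. 3 to 6 can be computed together, as in the following minimal sketch. The function names are hypothetical, and the Euclidean distance is used only as a placeholder for the selected distance function d; any function for which closer particles give smaller values could be passed instead.

```python
import math


def euclidean(a, b):
    """Placeholder distance function (an assumption of this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighborhood(X, R, d=euclidean):
    """Return (D, L, Gamma, K) for positions X and interaction range R."""
    n = len(X)
    # Eq. 3: distance vectors D_i stacked into the matrix D
    D = [[d(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Eq. 4: adjacent vectors L_i, l_ij = 1 when d(X_i, X_j) <= R
    L = [[1 if D[i][j] <= R else 0 for j in range(n)] for i in range(n)]
    # Eq. 5: neighbor sets Gamma_i and degrees K_i = |Gamma_i|
    Gamma = [[j for j in range(n) if L[i][j] == 1] for i in range(n)]
    K = [len(g) for g in Gamma]
    return D, L, Gamma, K
```

Taken literally, Eq. 4 makes every particle a member of its own neighbor set, because its distance to itself never exceeds R when R is at least the minimum of d; the sketch keeps this behavior.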



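Before the particle X_i walks, the transition probabilities of moving to each
neighbor in its neighborhood are computed first, as follows:

\[
P_i(t+1)_{1 \times n} = \{ p_{ij}(t+1),\ j = 1, 2, \cdots, N \}, \qquad
p_{ij}(t+1) =
\begin{cases}
\dfrac{a_{ij}(t+1)}{\sum_{h \in \Gamma_i(t+1)} a_{ih}(t+1)} & \text{if } j \in \Gamma_i(t+1) \\
0 & \text{otherwise}
\end{cases}
\tag{7}
\]

where a_ij(t+1) is the unnormalized weight of the transition from X_i to X_j,
which is computed from the current degree K_j(t+1), the current distance
d(X_i(t), X_j(t)), the initial degree K_j(0) of the particle X_j and the
initial distance d(X_i(0), X_j(0)) between the two particles X_i and X_j.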
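
The normalization over the neighbor set can be sketched as follows. The weight function is passed in as a parameter; the default `example_weight`, which favors neighbors whose degree has grown and penalizes large current and initial distances, is purely an illustrative assumption and should not be read as the weight of the modified model. All names in the sketch are hypothetical.

```python
def example_weight(k_now_j, k_init_j, d_now, d_init):
    """Hypothetical weight a_ij, assumed here only for illustration."""
    return (k_now_j / k_init_j) / (d_now * d_init)


def transition_vector(i, gamma_i, K_now, K_init, D_now, D_init,
                      weight=example_weight):
    """Return P_i: weights of the neighbors of particle i, normalized to
    sum to one; entries of non-neighbors stay zero."""
    n = len(K_now)
    a = [0.0] * n
    for j in gamma_i:
        a[j] = weight(K_now[j], K_init[j], D_now[i][j], D_init[i][j])
    total = sum(a)
    if total == 0:
        return a  # no neighbors: all probabilities are zero
    return [a_j / total for a_j in a]
```

With this assumed weight, a neighbor at half the distance of another neighbor (and equal degrees) receives four times its probability, because both the current and the initial distance enter the denominator.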

A particle in the system is allowed to choose only one of the neighbors in its
neighborhood as the transition direction at a time, and then takes a step
toward this neighbor. Which neighbor is selected as the transition direction
by the particle X_i depends on an event-generating function G_i(·,·), which is
a function of the transition probability vector P_i(t+1)_{1×n} and the
distance vector D_i(t+1)_{1×n} of the particle X_i. Before the
event-generating function is used, a set of events of the particle X_i,
Eve_i(t+1)_{1×n} = {eve_ij, j = 1, 2, ..., N}, needs to be built, in which
each element eve_ik, k ∈ Γ_i(t+1), corresponds to a neighbor in its
neighborhood. The event generated by the event-generating function indicates
which event in the event set Eve_i(t+1) takes place.

\[
Eve_i(t+1)_{1 \times n} = \{ eve_{ij}(t+1),\ j = 1, 2, \cdots, N \}, \qquad
\begin{cases}
eve_{ik}(t+1) = 1 & \text{if } G_i(P_i(t+1), D_i(t+1)) = k,\ k \in \Gamma_i(t+1) \\
eve_{ij}(t+1) = 0 & j \in \Gamma_i(t+1) \setminus k
\end{cases}
\tag{8}
\]

It is worth noting that the event-generating function G_i(·,·) generates only
one event at a time, that is, exactly one event in the event set
Eve_i(t+1)_{1×n} occurs. As such, the particle X_i takes a step toward the
neighbor X_k, and the walking length ω_i(t+1) is proportional to the
transition probability p_ik(t+1), as expressed below:

\[
\begin{aligned}
\omega_i(t+1) &= p_{ik}(t+1) \times d(X_i(t), X_k(t)) \\
 &= \big( P_i(t+1)_{1 \times n} \times (Eve_i(t+1)_{1 \times n})^{T} \big)
    \times \big( D_i(t+1)_{1 \times n} \times (Eve_i(t+1)_{1 \times n})^{T} \big)
\end{aligned}
\tag{9}
\]

After the particle X_i walks, its position is updated by means of the
following equation:

\[
\begin{aligned}
X_i(t+1)_{1 \times m} &= X_i(t)_{1 \times m}
  + \big( Eve_i(t+1)_{1 \times n} \times X(t)_{n \times m} - X_i(t)_{1 \times m} \big)
    \times \omega_i(t+1) / d(X_i(t), X_k(t)) \\
 &= X_i(t)_{1 \times m}
  + \big( Eve_i(t+1)_{1 \times n} \times X(t)_{n \times m} - X_i(t)_{1 \times m} \big)
    \times p_{ik}(t+1)
\end{aligned}
\tag{10}
\]

When all particles in the system have walked, their positions are recomputed
synchronously, which means that one iteration of the modified model is
completed.
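
For a single particle, the walking length and the position update can be sketched as below. The function name `walk_step` and its argument layout are assumptions of this example; the chosen neighbor k is taken as given, since it is produced by the event-generating function.

```python
def walk_step(i, k, X, P_i, D_i):
    """Move particle i toward its chosen neighbor k.

    X is the list of current positions, P_i the transition probability
    vector of particle i and D_i its distance vector.
    Returns (omega, new_position)."""
    # Walking length: transition probability times distance to neighbor k
    omega = P_i[k] * D_i[k]
    # Position update: move the fraction p_ik of the displacement X_k - X_i
    new_position = [x_i + (x_k - x_i) * P_i[k]
                    for x_i, x_k in zip(X[i], X[k])]
    return omega, new_position
```

Because the update only returns the new position, the caller can collect the results for all particles and replace the position matrix afterwards, which matches the synchronous update described in the text.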



4 Convergence of modified model 

In this section, we discuss the convergence of the modified model of random
walk. If the model converges, the N particles of a system governed by the
modified model will walk, gather and eventually form several separate
clusters. At first, we introduce Theorem 1 [14], which specifies the
probability that a particle A walking at random on a line is absorbed at the
origin.

Theorem 1 Particle A is a particle walking randomly on a line. A(n)
represents the position of Particle A at moment n, {A(n), n = 0, 1, 2, ...},
and A(0) is the initial position of Particle A, A(0) = l, l = 0, 1, 2, ....
Its transition probabilities are defined as follows,

\[
P_{li} =
\begin{cases}
p_l > 0 & \text{if } i = l + 1 \\
r_l \ge 0 & \text{if } i = l \\
q_l > 0 & \text{if } i = l - 1 \\
0 & \text{otherwise}
\end{cases}
\tag{11}
\]

Position 0, the origin, is an absorbing state, that is, if the current
position of Particle A is the origin, then the walk stops. Further, if the
transition probabilities are constants, then the probability that Particle A
is absorbed at Position 0 is:

\[
f_{l,0} = \sum_{n=0}^{\infty} f_{l,0}^{(n)} =
\begin{cases}
(q/p)^{l} & \text{if } p > q \\
1 & \text{if } p \le q
\end{cases},
\qquad f_{0,0} = 1
\tag{12}
\]

Proof 1 See reference [14].

According to Theorem 1, if the probability of Particle A walking toward the
origin is larger than that of moving in the opposite direction, or the
probabilities of moving in the two opposite directions are equal, then
Particle A is absorbed at the origin with probability one. On the other hand,
if the probability of moving away from the origin is larger than that of
moving toward it, the absorbing probability drops to (q/p)^l. Further,
consider two extreme cases: (a) the probability of moving away from the origin
is much larger than the other transition probability, p ≫ q; or (b) the
initial position of Particle A approaches infinity and p > q. In both cases
the limit of the absorbing probability of Particle A is zero, as expressed
below:

\[
\text{(a)}\ f_{l,0} = \lim_{q/p \to 0} \left( \frac{q}{p} \right)^{l} = 0
\quad \text{or} \quad
\text{(b)}\ f_{l,0} = \lim_{l \to \infty} \left( \frac{q}{p} \right)^{l} = 0
\tag{13}
\]
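
The absorbing probability of Theorem 1 for constant transition probabilities can be evaluated directly. The following sketch, whose function name is hypothetical, implements Eq. 12 and makes the two limiting cases easy to check numerically.

```python
def absorption_probability(p, q, l):
    """Probability that a walk started at position l >= 0 is absorbed at 0.

    p: probability of stepping away from the origin (l -> l + 1)
    q: probability of stepping toward the origin (l -> l - 1)"""
    if l == 0:
        return 1.0          # already at the absorbing position
    if p <= q:
        return 1.0          # drift toward the origin, or no drift
    return (q / p) ** l     # drift away from the origin
```

For example, `absorption_probability(0.5, 0.25, 2)` gives 0.25, and the value shrinks toward zero both as q/p decreases and as the starting position l grows, as stated in Eq. 13.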

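Theorem 2 Particle A and B are two particles walking randomly on a line.
A(n) and B(n) represent the positions of Particle A and B at moment n
respectively, {A(n), n = 0, 1, 2, ...}, {B(n), n = 0, 1, 2, ...}. A(0) and
B(0) are the initial positions of Particle A and B, A(0) = j, B(0) = k,
j, k = 0, 1, 2, .... Their transition probabilities are defined as below,

\[
P^{A}_{ji} =
\begin{cases}
p_j > 0 & \text{if } i = j + 1 \\
q_j > 0 & \text{if } i = j - 1 \\
0 & \text{otherwise}
\end{cases}
\qquad
P^{B}_{ki} =
\begin{cases}
p_k > 0 & \text{if } i = k + 1 \\
q_k > 0 & \text{if } i = k - 1 \\
0 & \text{otherwise}
\end{cases}
\tag{14}
\]

Hence, the encounter probability of Particle A and B is:

\[
f_{l,0} =
\begin{cases}
\left( \dfrac{p_j q_k}{q_j p_k} \right)^{l} & \text{if } q_j p_k > p_j q_k \\
1 & \text{if } q_j p_k \le p_j q_k
\end{cases}
\tag{15}
\]

where l = k − j is the initial distance between the two particles.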



Proof 2 When Particle A encounters Particle B at a certain position on the
line, the distance between them must be zero. Therefore, we first define a
variable Z(n), which represents the distance between Particle A and B,
{Z(n) = A(n) − B(n), n = 0, 1, 2, ...}. At the beginning, the distance between
Particle A and B is Z(0) = k − j = l, l = 2, 4, ..., 2n, ..., whose value must
be even. This is because both Particle A and B move one unit at each step; if
the initial distance is odd, then no matter how the particles walk and how
much time is taken, they never encounter on this line, i.e., Z(i) ≠ 0,
i = 1, 2, .... To solve this problem, we introduce a new variable η = 1. If
the distance between the two particles is less than or equal to η, we deem
that they encounter at that time. Thus, the initial distance l may be even or
odd.

If the sequence {Z(n), n = 0, 1, 2, ...} is considered as the position
sequence of a Particle Z, then the transition probabilities of Particle Z are
determined by the transition probabilities of Particle A and B, and take the
following form,

\[
P^{Z}_{li} =
\begin{cases}
p_j q_k & \text{if } i = l - 2 \\
p_j p_k + q_j q_k & \text{if } i = l \\
q_j p_k & \text{if } i = l + 2
\end{cases}
\tag{16}
\]

Therefore, the sequence Z(n) is a random walk on a line. As such, the
encounter probability of Particle A and B is equal to the probability that
Particle Z is absorbed at the origin. So, according to Theorem 1, the
absorbing probability of Particle Z, i.e., the encounter probability of
Particle A and B, is given by Eq. 15. The Theorem is proved.

According to our modified model, a particle X_i can only take one step ω_i
toward one of the neighbors in its neighbor set Γ_i(t+1) at a time. If the
particle X_i has taken a step toward a particle X_j that is one of its
neighbors, and the particle X_j walks toward X_i as well or stays still, then
the distance between them decreases. Applying Eq. 7 to compute the transition
probabilities of the particles X_i and X_j respectively, we can see that their
probabilities of moving toward each other, p_ij and p_ji, both increase,
provided that their degrees K_i and K_j change slowly enough. If this
transition process always occurs during the iterations of the model, then
according to Theorem 2 the particles X_i and X_j will encounter with
probability one, because the inequality p_ik p_jh ≤ p_ij p_ji, k ∈ Γ_i(t+1),
h ∈ Γ_j(t+1), always holds.
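
The difference walk Z used in the proof of Theorem 2 can be illustrated with a short sketch. It assumes that Particle A starts to the left of Particle B, so that the distance shrinks when A steps right and B steps left; the function names are hypothetical.

```python
def difference_walk(p_j, q_j, p_k, q_k):
    """Transition probabilities of the distance Z between A and B.

    Keys are the change of the distance in one step: -2, 0 or +2."""
    return {
        -2: p_j * q_k,               # A steps right, B steps left
        0: p_j * p_k + q_j * q_k,    # both step in the same direction
        2: q_j * p_k,                # A steps left, B steps right
    }


def meets_with_probability_one(p_j, q_j, p_k, q_k):
    """True when moving apart is not more likely than approaching."""
    return q_j * p_k <= p_j * q_k
```

When p_j + q_j = 1 and p_k + q_k = 1, the three probabilities sum to one, so Z is itself a random walk on a line to which Theorem 1 can be applied.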

On the other hand, if the particle X_i walks toward X_j, the distances between
the particle X_i and its other neighbors rise at the same time. As such, the
probabilities of walking to the other neighbors drop too, assuming the degrees
still change slowly enough. Simultaneously, if a particle X_k within the
neighbor set of the particle X_i is moving away from X_i, then their
probabilities of approaching each other decrease further according to Eq. 7.
If this process is carried to the extreme, the distance between the particles
X_i and X_k increases greatly, while their probabilities of moving toward one
another decrease sharply. In this case, the limit of the encounter probability
is:

\[
\lim f_{l,0} = \lim_{\frac{p_{ik} p_{ki}}{p_{ij} p_{kh}} \to 0}
\left( \frac{p_{ik} p_{ki}}{p_{ij} p_{kh}} \right)^{l} = 0,
\quad (\because\ p_{ij} p_{kh} > p_{ik} p_{ki}),\ j \in \Gamma_i(t+1),\ h \in \Gamma_k(t+1)
\tag{17}
\]

As analyzed above, for a system of N particles governed by the modified model
of random walk, some particles come close to each other, whereas others move
away from one another. This convergence behavior explains why the particles in
the particle set X can gather together and finally establish several separate
clusters. Although the real motions of the particles are far more complex than
the simple cases analyzed above, Theorem 2 still provides a way to analyze and
explain the results obtained by the modified model.

5 Application to clustering 

In this section, two clustering algorithms based on the modified model of
random walk are first constructed: a deterministic random walk based
clustering algorithm (RW1) and a nondeterministic one (RW2). Then the two
clustering algorithms are analyzed in detail.

5.1 Algorithms 

Assume an unlabeled dataset X = {X_1, X_2, ..., X_N}, in which each instance
has m features. In the two clustering algorithms based on the modified model,
each data point in the dataset is regarded as a movable particle that walks in
the whole space at random and obeys the rules of the modified model of random
walk.

After selecting a similarity (or distance) function d : X × X → ℝ, one can
compute the distance matrix D(t+1)_{n×n}, the adjacent matrix L(t+1)_{n×n}
and the transition probability matrix P(t+1)_{n×n} step by step according to
the modified model. However, if the event-generating function G_i(·,·) is
chosen differently, the transition direction and distance of a particle X_i
will differ. As a consequence, the clustering results may change considerably
depending on the event-generating function G_i(·,·). Thus, we design two
different event-generating functions and construct two clustering algorithms:
(a) a deterministic random walk based clustering algorithm (RW1) and (b) a
nondeterministic random walk based clustering algorithm (RW2).

5.1.1 Deterministic clustering algorithm (RW1)

The algorithm RW1, which uses a deterministic event-generating function
G_i^1(·,·), has no randomness, i.e., under the same conditions, no matter how
many times the algorithm is run, the clustering results obtained are
unchanged. The event produced by the event-generating function G_i^1(·,·)
satisfies the following equation:

\[
\begin{aligned}
G_i^{1}(P_i(t+1), D_i(t+1)) &= k
 = \mathop{\arg\max}_{k \in \Gamma_i(t+1) \setminus \alpha}
   \big( P_i(t+1) \setminus \{ p_{ij},\ j \in \alpha \} \big) \\
\alpha &= \{ j \mid d(X_i(t), X_j(t)) \le \theta,\ j \in \Gamma_i(t+1) \}
\end{aligned}
\tag{18}
\]

where θ is a collision-avoiding threshold, which indicates the minimal
distance between two data points.

The event generated by the event-generating function G_i^1(·,·) represents
that an event eve_ik in the event set Eve_i(t+1)_{1×n} of a data point X_i
occurs, namely eve_ik = 1, which means that the neighbor chosen as the
transition direction is the one that has the maximal transition probability
and satisfies the inequality d(X_i(t), X_k(t)) > θ. Then, the transition
distance is computed by Eq. 9. When the corresponding events of all data
points have been produced, the event matrix of the system is formed,
Eve(t+1)_{n×n} = [Eve_i(t+1)_{1×n}]_{i=1,2,...,N}. Finally, the new position
of the data point X_i is updated by means of Eq. 10. After one iteration, the
sum of the transition distances of all data points, Σ_{i=1}^{N} ω_i, is
computed. If it is less than a preset threshold ε, all data points stop
walking and the algorithm ends.
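
The deterministic selection rule can be sketched as a small function. The name `rw1_event` is hypothetical, and returning `None` when every neighbor lies within the collision-avoiding threshold is an assumption of this sketch; the text does not state what happens in that case.

```python
def rw1_event(P_i, D_i, gamma_i, theta):
    """Return the index k of the neighbor chosen by particle i.

    P_i: transition probability vector, D_i: distance vector,
    gamma_i: list of neighbor indices, theta: collision-avoiding threshold."""
    # Exclude the set alpha of neighbors that are already too close
    candidates = [j for j in gamma_i if D_i[j] > theta]
    if not candidates:
        return None  # assumed behavior: the particle stays where it is
    # Choose the remaining neighbor with maximal transition probability
    return max(candidates, key=lambda j: P_i[j])
```

Ties are broken in favor of the neighbor that appears first in `gamma_i`, which keeps the rule deterministic for a fixed ordering of the data points.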

5.1.2 Nondeterministic clustering algorithm (RW2) 

The nondeterministic algorithm based on the modified model is characterized by
its randomness, which means that wider areas may be explored and, as a result,
better solutions may be found. In algorithm RW2, an event-generating function
with uncertainty is employed to establish the nondeterministic clustering
algorithm.

As in algorithm RW1, the distance matrix D(t+1)_{n×n}, the adjacent matrix
L(t+1)_{n×n} and the transition probability matrix P(t+1)_{n×n} are first
computed according to the modified model. Next, for each data point X_i, a
biased die with |Γ_i(t+1)| faces is used, each face of which corresponds to a
neighbor in its neighbor set. The bias means that the probability of each face
appearing equals a nonzero element of the transition probability vector
P_i(t+1)_{1×n}. After this die is rolled K_i(t+1) times, the results are
recorded in a vector h_i(t+1). In practice, the rolling of the die is
simulated by dividing the interval [0, 1] into |Γ_i(t+1)| subintervals, each
of which corresponds to a face of the die, i.e., a neighbor in the neighbor
set of the data point X_i. The length of each subinterval is equal to a
transition probability, length(g, g+1) = p_ij(t+1),
g = 0, 1, ..., |Γ_i(t+1)|, j ∈ Γ_i(t+1). Finally, K_i(t+1) random numbers
between zero and one are generated from the uniform distribution, and the
number of them falling into each subinterval is recorded in the vector
h_i(t+1).

For the event-generating function G_i^2(·,·) applied in algorithm RW2, the
generated event satisfies the following equation,

\[
\begin{aligned}
G_i^{2}(P_i(t+1), D_i(t+1)) &= k
 = \mathop{\arg\max}_{k \in \Gamma_i(t+1) \setminus \beta}
   \big( h_i(t+1) \setminus \{ h_{ij},\ j \in \beta \} \big) \\
\beta &= \{ j \mid d(X_i(t), X_j(t)) \le \theta,\ j \in \Gamma_i(t+1) \}
\end{aligned}
\tag{19}
\]

In other words, the corresponding event eve_ik = 1 in the event vector
Eve_i(t+1)_{1×n} takes place, i.e., the neighbor X_k is chosen, for which the
distance between the data points X_i and X_k is larger than the threshold θ
and the count of random numbers falling into its subinterval is largest. Next,
the data point X_i takes a step ω_i toward the data point X_k, and then its
position is updated by Eq. 10. Similarly, when the sum of the transition
distances of all data points is less than a threshold ε, Σ_{i=1}^{N} ω_i < ε,
the algorithm exits.
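
The biased die and the resulting selection can be sketched as follows. The function names are hypothetical; the random generator is passed in explicitly so that runs can be reproduced, and returning `None` when no neighbor is farther away than θ is an assumption of this sketch.

```python
import random


def roll_biased_die(P_i, gamma_i, rolls, rng):
    """Count how often each neighbor's subinterval of [0, 1] is hit."""
    counts = {j: 0 for j in gamma_i}
    for _ in range(rolls):
        u = rng.random()
        upper = 0.0
        chosen = gamma_i[-1]          # fallback guards against rounding
        for j in gamma_i:
            upper += P_i[j]           # right end of neighbor j's subinterval
            if u < upper:
                chosen = j
                break
        counts[chosen] += 1
    return counts


def rw2_event(P_i, D_i, gamma_i, theta, rng):
    """Return the neighbor with the largest hit count among those farther
    away than theta, or None if there is no such neighbor."""
    counts = roll_biased_die(P_i, gamma_i, len(gamma_i), rng)  # K_i rolls
    candidates = [j for j in gamma_i if D_i[j] > theta]
    if not candidates:
        return None
    return max(candidates, key=lambda j: counts[j])
```

A call such as `rw2_event(P_i, D_i, gamma_i, 1.1, random.Random(0))` gives a reproducible choice, while a fresh unseeded generator gives the nondeterministic behavior that distinguishes RW2 from RW1.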

The steps of Algorithm RW1 and Algorithm RW2 are summarized in Table 1.

Table 1: Steps of clustering algorithm.

Select a distance function d(·,·)
Initialization:
    Set interaction range R
    Compute initial similarity (distance) matrix D(0)_{n×n} = [d(X_i(0), X_j(0))]_{i,j=1,2,...,N}
    Compute initial adjacent matrix L(0)_{n×n}
    Compute initial degree vector K(0)_{n×1}
Repeat:
    Compute current similarity (distance) matrix D(t+1)_{n×n} using Eq. 3
    Produce current adjacent matrix L(t+1)_{n×n} based on D(t+1)_{n×n} using Eq. 4
    Compute current neighbor set Γ_i(t+1) of each data point and degree vector K(t+1)_{n×1} using Eq. 5
    For each data point X_i
        Compute transition probability vector P_i(t+1)_{1×n} by means of Eq. 7
        RW1: Generate an event using Eq. 18
        RW2: Generate an event using Eq. 19
        Identify the transition direction of each data point
        Produce event vector Eve_i(t+1)_{1×n} using Eq. 8
        Compute transition distance ω_i using Eq. 9
    End For
    Update positions of all points by means of Eq. 10
Until Σ_{i=1}^{N} ω_i < ε
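
The loop of Table 1 can be put together for the deterministic variant as in the sketch below. Several parts are assumptions made only to obtain something runnable: the transition weight `a` is a hypothetical stand-in (degree ratio divided by the product of current and initial distance), a particle with no admissible neighbor simply stays in place, and `max_iter` is an added safety cap. The distance follows the exponential form used in the experiments.

```python
import math


def dist(a, b, sigma=1.0):
    """Exponential distance; equals 1 for identical points."""
    norm = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return math.exp(norm / (2.0 * sigma ** 2))


def rw1_cluster(X, R, theta=1.1, eps=1e-3, max_iter=100):
    """Run the RW1 loop and return the final positions."""
    X = [list(x) for x in X]
    n = len(X)
    D0 = [[dist(a, b) for b in X] for a in X]             # initial distances
    K0 = [sum(1 for v in row if v <= R) for row in D0]     # initial degrees
    for _ in range(max_iter):
        D = [[dist(a, b) for b in X] for a in X]           # Eq. 3
        Gamma = [[j for j in range(n) if D[i][j] <= R]     # Eqs. 4 and 5
                 for i in range(n)]
        K = [len(g) for g in Gamma]
        new_X, total = [], 0.0
        for i in range(n):
            # Hypothetical weight a_ij (illustrative assumption only)
            a = {j: (K[j] / K0[j]) / (D[i][j] * D0[i][j]) for j in Gamma[i]}
            cand = [j for j in Gamma[i] if D[i][j] > theta]  # Eq. 18
            if not cand:
                new_X.append(X[i])       # assumed: no admissible neighbor
                continue
            k = max(cand, key=lambda j: a[j])
            p = a[k] / sum(a.values())                     # Eq. 7
            total += p * D[i][k]                           # Eq. 9
            new_X.append([xi + (xk - xi) * p               # Eq. 10
                          for xi, xk in zip(X[i], X[k])])
        X = new_X                                          # synchronous update
        if total < eps:
            break
    return X
```

On a toy input with two well separated pairs of points, the points of each pair move toward each other until they are closer than the collision-avoiding threshold, while the two pairs remain far apart.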
5.2 Analysis of algorithm



In the proposed clustering algorithms, the two most important elements are (a)
the computation of the transition probability matrix P_{n×n} and (b) the
design of the event-generating function G_i(·,·). By applying different
formulations to compute the transition probability matrix, or by designing
different event-generating functions, one can establish various algorithms.
For the problem of data clustering, however, the computation of the transition
probability matrix is generally associated with the similarity (or distance)
matrix D_{n×n}. For example, if only the distances among data points are
considered, the formulation for computing the transition probability may be
written as:

\[
P(t+1) = [p_{ij}(t+1)]_{i,j=1,2,\cdots,N}, \qquad
p_{ij}(t+1) =
\begin{cases}
\dfrac{1 / d(X_i(t), X_j(t))}{\sum_{h \in \Gamma_i(t+1)} 1 / d(X_i(t), X_h(t))} & \text{if } j \in \Gamma_i(t+1) \\
0 & \text{otherwise}
\end{cases}
\tag{20}
\]

When all data points begin to walk according to the rules of the modified
model, their new positions may differ greatly from their initial positions, in
particular for data points on the boundary of classes. For instance, a
boundary point X_i that belongs to a class y_1 may end up near a data point
X_j that belongs to another class y_2, because a wrong transition direction
was chosen. As shown in Fig. 3(a), there are two classes, y_1 and y_2, and two
boundary points, X_i and X_j, that belong to the two different classes. If the
transition probability matrix is computed by means of Eq. 20, the initial
transition probabilities of the data point X_i are as shown in Fig. 3(b), in
which the denominator of a number represents the distance between two data
points, and the numerator represents the transition probability. When the data
point X_i takes a step toward X_j and becomes closer to it, the transition
probabilities increase further when recomputed by Eq. 20, as shown in
Fig. 3(c). Thus, it is predicted that the data points X_i and X_j will
eventually encounter, and the data point X_i will be wrongly clustered as a
point of the class y_2.

Analyzing this process carefully, we find that this happens because the
computation of the transition probabilities depends only on the distances
between data points. Once the distance between two data points decreases,
their probabilities of attracting each other increase at the same time. Worse,
the encounter of the two data points is accelerated by this positive feedback.
This phenomenon is also consistent with Theorem 2. According to Theorem 2,
when the product of the probabilities of moving toward one another is not
smaller than that of moving away from each other, p_ik p_jh ≤ p_ij p_ji,
k ∈ Γ_i(t+1), h ∈ Γ_j(t+1), the encounter probability of the data points X_i
and X_j is one. Hence, we can conclude that points tend to walk to their
nearest neighbors when Eq. 20 is chosen as the formulation for computing the
transition probability matrix.

Figure 3: Comparison of two methods for computing transition probabilities.

To avoid the problems mentioned above, we introduce a new formulation, Eq. 7,
to compute the transition probability matrix in the modified model, which is
associated not only with the current distance matrix D(t+1)_{n×n}, but also
with the current degree vector K(t+1)_{n×1}, the initial distance matrix
D(0)_{n×n} and the initial degree vector K(0)_{n×1}. The degree of a data
point X_i describes the number of neighbors in its neighborhood, and also
reflects the density around the data point X_i. As a general rule, a data
point with a large degree lies in an area of high density, which indicates
that this point is perhaps a central point; on the other hand, a point with a
small degree may be a boundary point because of the low density.

Reanalyzing the above example with the same settings, we further assume that
the degrees of the data points X_i and X_j are K_i = K_j = 5, and that the
degrees of the other points are seven. This is reasonable: since the other
points are inner points of a class, they can have higher degrees than the
boundary points X_i and X_j. Now, we recalculate the transition probabilities
of the data point X_i walking to its neighbors by means of Eq. 7, and the
results are shown in Fig. 3(d). From Fig. 3(d), we can see that the transition
probability p_ij is the smallest, although the distance between X_i and X_j is
the shortest, whereas this probability is the largest according to Eq. 20.
Even if the data point X_i wrongly takes a step toward X_j and becomes closer
to it, its transition probability p_ij is still the smallest by means of
Eq. 7, although it is larger than its initial value, as shown in Fig. 3(e).

6 Experiments and discussions 

To evaluate the two clustering algorithms, we choose five datasets from the
UCI repository [15], namely the Soybean, Iris, Wine, Ionosphere and Breast
Cancer Wisconsin datasets, and run all experiments on them. In this section,
these datasets are first introduced briefly, and then the effects of several
important parameters are discussed. Finally, the experimental results of the
algorithms are illustrated.

6.1 Experiment setup 

The original data points in the above datasets are scattered in high
dimensional spaces spanned by their features; the description of all test
datasets is summarized in Table 2. For the Breast dataset, missing feature
values are replaced by random numbers. The algorithms are coded in Matlab 6.5.



Table 2: Description of datasets.

Dataset       Instances   Features   Classes
Soybean       47          21         4
Iris          150         4          3
Wine          178         13         3
Ionosphere    351         32         2
Breast        699         9          2



Throughout all experiments, data points in a dataset are considered as particles
which can walk randomly in the whole space and whose initial positions
are taken from the dataset. The similarity (or distance) between data points
is measured by the selected similarity (or distance) function d(·,·), which satisfies
the condition that the more similar two data points are, the smaller the output of
the function is. In the experiments, the similarity (or distance) function is chosen
as follows:

\[ d\big(X_i(t), X_j(t)\big) = \exp\left( \frac{\|X_i(t) - X_j(t)\|}{2\sigma^2} \right), \quad i, j = 1, 2, \ldots, N \tag{21} \]

where the symbol || · || represents the ℓ2-norm. The advantages of this function are
that it not only satisfies our requirements but also overcomes a drawback
of the Euclidean distance: when two points are too close, the output of the
Euclidean distance function approaches zero. In the modified model,
transition probabilities are inversely proportional to the distances between
data points according to Eq. 7. If the distance between two data points is so
small that the reciprocal of the output of the Euclidean distance function approaches
infinity, the computation of probabilities will fail. When Eq. 21 is
selected as the distance function, by contrast, it is more convenient to compute the
transition probabilities, since its minimum is one and the reciprocals of its output
lie between zero and one, 1/d(Xi(t), Xj(t)) ∈ (0, 1]. In addition, the parameter σ
in Eq. 21 and the collision-avoiding threshold θ are set to 1 and 1.1, respectively.
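
As a minimal sketch of Eq. 21, written in Python purely for illustration (the experiments themselves were coded in Matlab), the following shows why the reciprocal of the exponential distance stays bounded where the reciprocal of the Euclidean distance does not. The helper names exp_distance and reciprocal_weight are hypothetical and do not come from the original implementation.

```python
import math

def exp_distance(x, y, sigma=1.0):
    """Eq. 21: exp(||x - y|| / (2 * sigma^2)), using the Euclidean (l2) norm."""
    return math.exp(math.dist(x, y) / (2.0 * sigma ** 2))

def reciprocal_weight(x, y, sigma=1.0):
    """Reciprocal of Eq. 21; lies in (0, 1] and equals 1 for coincident points."""
    return 1.0 / exp_distance(x, y, sigma)

# Two coincident points: 1 / (Euclidean distance) would divide by zero,
# whereas the reciprocal of Eq. 21 is well defined.
p, q = (0.0, 0.0), (0.0, 0.0)
print(reciprocal_weight(p, q))            # coincident points
print(reciprocal_weight(p, (3.0, 4.0)))   # points 5 units apart
```

One caveat of this sketch: math.exp raises OverflowError for very large arguments, so widely separated raw feature vectors may need rescaling before such a function is applied.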

Another important parameter is the interaction range R, which indicates
the radius of the neighborhood of a data point. For different datasets, it is quite
difficult to preset a proper interaction range R directly, because the relations
among data points vary from dataset to dataset. To simplify this problem, in the
experiments we introduce a new variable b to determine the interaction range
R indirectly. The method is as follows. At first, each row of the initial distance
matrix D(0), of size N × N, is sorted in ascending order. Then the interaction range
R is set to the median of the b-th column of the sorted distance matrix, which can be
expressed as below:

\[ R = \mathrm{median}\Big( \mathrm{sort}\big(D(0)_{N \times N}\big) \times [\,0 \cdots 0 \;\; 1 \;\; 0 \cdots 0\,]^{T} \Big) \tag{22} \]

As such, the magnitude of the interaction range R can be adjusted conveniently
through the variable b. For example, if a big interaction range is needed, one
can set a big b, and vice versa.
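
The rule of Eq. 22 can be sketched as follows. This is an illustrative Python version under stated assumptions, not the original Matlab code: the function name interaction_range is hypothetical, b is taken to be 1-based, and the self-distance (which equals 1 under Eq. 21) is assumed to occupy the first column of each sorted row.

```python
import math
from statistics import median

def exp_distance(x, y, sigma=1.0):
    """Eq. 21: exp(||x - y|| / (2 * sigma^2))."""
    return math.exp(math.dist(x, y) / (2.0 * sigma ** 2))

def interaction_range(points, b, sigma=1.0):
    """Median of the b-th column of the row-wise sorted initial distance matrix.

    Assumption: b is 1-based and each row still contains the self-distance.
    """
    d0 = [[exp_distance(p, q, sigma) for q in points] for p in points]
    sorted_rows = [sorted(row) for row in d0]      # sort every row ascendingly
    return median(row[b - 1] for row in sorted_rows)

# Four equally spaced points on a line; a larger b gives a larger (or equal) R.
pts = [(0.0,), (1.0,), (2.0,), (3.0,)]
for b in (1, 2, 3):
    print(b, interaction_range(pts, b))
```

Because each row is sorted in ascending order, the value returned cannot decrease as b grows, which matches the remark that a bigger b yields a bigger interaction range.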

6.2 Effects of parameters 

6.2.1 Number of clusters vs. interaction range R 

As is known, the interaction range R, or the variable b, controls the radius of
the neighborhood of a data point. For a dataset, the number of clusters depends
partly on the interaction range R. Generally speaking, as the interaction range R
increases, the number of clusters decreases. For example, when
a small b is set, the number of neighbors in the neighborhood of a data point
Xi is small, which reduces the available transition directions as well. Further,
considering covers of a graph, we can find that the neighborhoods of data points
intersect each other only slightly, so that the connected domain formed is small.
Even if the data point Xi walks in space, it only observes a small area. In this case,
it gathers together only with the not-too-distant data points around it. As a
consequence, all data points form many small clusters at last, as is shown in
Fig. 4(a).

On the other hand, if a big b is selected, the data point Xi can observe a
wider area because of the big interaction range R, and there are
more neighbors in its neighborhood. Thus, the neighborhoods of data points
intersect a lot and form bigger connected domains. Hence, in the end they
establish several big clusters. For the same dataset, Fig. 4 exhibits the
relationship between the number of clusters and the interaction range R. As analyzed
above, six clusters are formed when the variable b = 10, and three clusters are
established when b = 25. Therefore, in the case that the exact number of clusters
is unknown in advance, one can adjust the interaction range R or the variable
b to obtain different numbers of clusters according to practical situations.
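
The neighborhood-overlap argument above can be illustrated with a small sketch. It is only a proxy, not the RW1 or RW2 algorithm: it assumes a plain Euclidean radius in place of Eqs. 21 and 22, and it counts the connected components of the graph that links points lying within that radius of each other. The function name count_connected_domains is hypothetical.

```python
import math

def count_connected_domains(points, radius):
    """Count connected components of the graph linking points within `radius`."""
    n = len(points)
    parent = list(range(n))            # union-find forest, one tree per component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)   # merge the two components
    return len({find(i) for i in range(n)})

# Toy one-dimensional data: two tight groups and one isolated point.
pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,), (30.0,)]
for radius in (1.5, 9.0, 20.0):
    print(radius, count_connected_domains(pts, radius))
```

On this toy data the number of connected domains shrinks as the radius grows, which mirrors the qualitative trend described for the number of clusters.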

(a) b = 10    (b) b = 18    (c) b = 25

Figure 4: Number of clusters with different values of the interaction range R or the
variable b.

6.2.2 Rate of convergence vs. the interaction range

The rate of convergence of the proposed clustering algorithms is closely associated
with the interaction range R or the variable b. As is known, the bigger
the interaction range R is, the more neighbors a data point has. In this case,
according to Eq. 7, the transition probabilities of a data point Xi become smaller
than those obtained with a small interaction range R. Moreover, the transition
distance is proportional to the transition probability pij, so it drops as the
transition probability decreases. Thus, the rate of convergence is reduced, but
data points also have more chances to explore a wider space and to come into
contact with more data points.

On the other hand, if a small interaction range R is set, the transition
probabilities increase, and the transition distance increases
too. In this case, the rate of convergence grows. However, choosing the
interaction range R requires a trade-off between a lack of exploration and slow
convergence. For the same dataset, the rates of convergence of the two clustering
algorithms, RW1 and RW2, are compared in Fig. 5 for different b. Every dot in Fig. 5
represents the sum of the transition distances of all N data points for one walk,
while every dot in Fig. 5(b) is an average over ten runs of RW2.



(a) Algorithm RW1    (b) Algorithm RW2

Figure 5: Comparison of the rate of convergence of the two clustering algorithms.

From Fig. 5, we can see that after the first walk the sum of transition distances
when b = 5 is much larger than the sum when b = 25, which means
that the smaller the interaction range R is, the bigger the sum of transition
distances of data points is. Meanwhile, the rate of convergence decreases with
the increase of b, but the differences are slight. Besides, after the first walk the
sum of transition distances of Algorithm RW2 is smaller than that of Algorithm RW1
with the same b, and with the same number of iterations Algorithm
RW2 converges to a smaller value, which partly suggests that the exploring capacity
of Algorithm RW2 is better than that of Algorithm RW1.



6.3 Experimental results 

We have applied the two algorithms, RW1 and RW2, to the above-mentioned five
datasets from the UCI repository. For each dataset, RW1 and RW2 are run several
times to get the results at different b. For the same dataset, as is known, the
number of clusters decreases with the increase of b. With a small b, it is possible
that the number of clusters is larger than the preset number of clusters in
the dataset after the algorithm ends. In that case, a merging subroutine is called to
merge unwanted clusters, which works as follows. At first, the cluster with the
fewest data points is identified, and then it is merged into the cluster whose
centroid is nearest to its own. This subroutine is repeated until the number
of clusters is equal to the preset number. Finally, the results obtained by the
algorithm are represented by the clustering accuracy, which is defined as below:

Definition 1 clusteri is the label which is assigned to a data point Xi in a
dataset by the algorithm, and ci is the actual label of the data point Xi in the
dataset. Then the clustering accuracy is [10]:

\[ \mathrm{accuracy} = \frac{\sum_{i=1}^{N} \delta\big(\mathrm{map}(cluster_i),\, c_i\big)}{N} \tag{23} \]

where the mapping function map(·) maps the label obtained by the algorithm to the
actual label, and δ(x, y) equals 1 if x = y and 0 otherwise.

Fig. 6(a)-(e) demonstrate the results achieved by Algorithms RW1 and RW2
on the five datasets respectively, in which every dot represents a clustering
accuracy. Since Algorithm RW2 is nondeterministic, for each dataset Algorithm
RW2 is run twenty times with the same b, and the mean and variance of those
results are drawn in each figure using error bars. In addition, for each b, the
maximum of the twenty results also appears in the figure.

As is shown in Fig. 6(a)-(e), for the same dataset, the two algorithms get
similar results at the same b, but the maximum obtained by Algorithm RW2
is much better than that of Algorithm RW1. As mentioned above, the very
randomness of Algorithm RW2 may be responsible for getting better results, which
also suggests that data points have explored wider areas.

We compare our results to those obtained by other clustering algorithms, for
example, Kmeans [16], PCA-Kmeans [16] and LDA-Km [16]. The comparison is
summarized in Table 3.

(a) Soybean dataset    (b) Iris dataset    (c) Wine dataset    (d) Ionosphere dataset    (e) Breast dataset

Figure 6: Comparison of clustering accuracies of the two proposed algorithms.
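
The evaluation procedure of this subsection, that is, the merging subroutine followed by the clustering accuracy of Definition 1, might be sketched as below. This is an illustrative Python version under stated assumptions rather than the original Matlab code: centroids are compared with the Euclidean distance, ties for the smallest cluster are broken by first occurrence, and map(·) is taken to be the one-to-one label mapping that maximizes the number of matches, found here by brute force over permutations. All function names are hypothetical.

```python
import math
from itertools import permutations

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[k] for p in points) / len(points) for k in range(dim))

def merge_small_clusters(points, labels, target_k):
    """Merge the smallest cluster into the one with the nearest centroid,
    repeating until only target_k clusters remain."""
    labels = list(labels)
    while len(set(labels)) > target_k:
        members = {}
        for p, lab in zip(points, labels):
            members.setdefault(lab, []).append(p)
        smallest = min(members, key=lambda lab: len(members[lab]))
        c_small = centroid(members[smallest])
        nearest = min((lab for lab in members if lab != smallest),
                      key=lambda lab: math.dist(c_small, centroid(members[lab])))
        labels = [nearest if lab == smallest else lab for lab in labels]
    return labels

def clustering_accuracy(cluster_labels, true_labels):
    """Fraction of points whose mapped cluster label equals the actual label.

    Assumption: map(.) is the best one-to-one mapping, searched exhaustively,
    which is only practical for a handful of clusters.
    """
    found = sorted(set(cluster_labels))
    actual = sorted(set(true_labels))
    if len(found) > len(actual):
        raise ValueError("more clusters than classes; merge clusters first")
    best = 0
    for perm in permutations(actual, len(found)):
        mapping = dict(zip(found, perm))
        hits = sum(mapping[c] == t for c, t in zip(cluster_labels, true_labels))
        best = max(best, hits)
    return best / len(true_labels)

# Toy example: three clusters found, two classes expected.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (3, 3)]
merged = merge_small_clusters(pts, [0, 0, 0, 1, 1, 2], target_k=2)
print(merged, clustering_accuracy(merged, ["a", "a", "a", "b", "b", "b"]))
```

For larger numbers of clusters, an assignment solver would be a more scalable way to obtain the mapping than exhaustive search.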



Table 3: Comparison of clustering accuracies of algorithms.

Algorithm     Soybean        Iris           Wine           Ionosphere     Breast
RW1           89.36%         90.67%         96.63%         69.52%         96.14%
RW2           79.57±1.88%    90.48±0.31%    94.05±1.25%    68.86±1.03%    95.87±0.21%
MAX in RW2    85.11%         90.67%         96.63%         70.37%         96.28%
Kmeans        68.1%          89.3%          70.2%          71%            -
PCA-Kmeans    72.3%          88.7%          70.2%          71%            -
LDA-Km        76.6%          98%            82.6%          71.2%          -

7 Conclusion



We have introduced a modified model of random walk, and developed two clustering
algorithms based on it: (a) the deterministic random walk based clustering
algorithm (RW1) and (b) the nondeterministic random walk based clustering
algorithm (RW2). In these algorithms, data points in a dataset are considered
as particles which can move randomly in the whole space. Initially, the sum of
transition distances of data points is large, and it approaches a stable
value as the clusters are gradually formed. If the sum is less than a preset
threshold ε, the algorithm exits.
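
The exit condition can be sketched as a generic loop. This is a minimal illustration under assumptions: the transition distance of a point is taken to be the Euclidean length of its displacement in one walk, and the step argument is a hypothetical placeholder for the position update of RW1 or RW2, which is not reproduced here. The toy step used in the example simply moves every point halfway toward the global mean.

```python
import math

def total_transition_distance(old_positions, new_positions):
    """Sum over all points of the distance moved in one walk."""
    return sum(math.dist(p, q) for p, q in zip(old_positions, new_positions))

def run_until_stable(positions, step, epsilon=1e-3, max_iters=1000):
    """Apply `step` until the summed transition distance drops below epsilon."""
    iteration = 0
    for iteration in range(1, max_iters + 1):
        new_positions = step(positions)
        moved = total_transition_distance(positions, new_positions)
        positions = new_positions
        if moved < epsilon:            # stopping rule from the text
            break
    return positions, iteration

def toy_step(positions):
    """Stand-in update (NOT the paper's rule): move halfway to the global mean."""
    dim = len(positions[0])
    mean = [sum(p[k] for p in positions) / len(positions) for k in range(dim)]
    return [tuple(p[k] + 0.5 * (mean[k] - p[k]) for k in range(dim))
            for p in positions]

final_positions, iterations = run_until_stable([(0.0,), (4.0,)], toy_step)
print(iterations, final_positions)
```

The max_iters guard is an addition of this sketch so that the loop terminates even if the summed distance never falls below ε.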

The modified model of random walk provides a heuristic for clustering data
points. As a whole, in the modified model data points tend to approach those data
points with large degrees and short distances. At last, data points
belonging to the same class are close to each other and form tight clusters,
while different clusters are away from one another. If the exact number of clusters
is unknown in advance, one can adjust the interaction range R or the
variable b to control the number of clusters, which decreases as
the interaction range R or the variable b increases. For the same interaction range R,
the rates of convergence of the two algorithms are fast, and Algorithm RW2 seems
more exploratory.

According to Theorem 2, when the modified model is applied to a system with N
particles, some particles come close to each other, while others stay away from one
another. In the end, several separate clusters are formed naturally. Further,
we have evaluated the clustering algorithms on five real datasets; the experimental
results are consistent with the conclusion of Theorem 2, and data points in the
datasets are clustered reasonably and efficiently. In conclusion, the proposed
algorithms can detect clusters with arbitrary shape, size and density.